3D Computer Vision (2024/25)¶

Exercise¶

Upload: 30.10.2024 (11:30)

Deadline: 16.01.2025 (23:59)

Our Group¶

Submitted by Group 14:

  • Saptarshi Bhattacharya
  • Rohan Gadgil
  • Anushree Bajaj
  • Pushkar Santosh Marathe

By submitting this exercise, you confirm the following:

  • All people listed above contributed to this solution
  • No other people were involved in this solution
  • No contents of this solution were copied from others (this includes people, large language models, websites, etc.)

Submission¶

Please hand in a single .zip file named according to the pattern "groupXX" (e.g. group00). The contents of the .zip should be as follows:

  • folder with the same name as the .zip file
    • .ipynb file
    • .html export of .ipynb with all the outputs you got
    • data folder containing necessary files to run the code

I.e.

  1. unzip the provided project.zip file
  2. rename folder "project" according to the pattern "groupXX"
  3. solve task inside .ipynb file
  4. export notebook as .html (File > Download as > HTML)
  5. zip folder groupXX
  6. submit groupXX.zip

Final Presentation¶

You will be required to present your solution in a 20-minute presentation, which includes:

  • Problem Overview
  • Solution Overview (e.g. pseudo code, mathematical formulas, visualizations)
  • Describe challenges & optimizations

After the presentation, there will be 10 minutes of questions and answers about your work.

3D Scene Reconstruction¶

Task Overview¶

Your task in this exercise is to do a dense reconstruction of a scene. This will involve multiple steps that you will encounter and learn about as the semester progresses. You can start implementing individual steps as soon as you learn about them or wait until you have learned more to implement everything together. In the latter case, be mindful that this exercise is designed for an entire semester and the workload is accordingly large.

You will be given the following data:

  • 9 color images of the scene.
    • 8 Bit RGB per pixel.
    • Each image rendered from a different position.
    • The camera used had lens distortion.
  • 9 Depth images of the scene.
    • 8 Bit Grayscale per pixel. The result of dividing the Z-depth by each image's maximum and then multiplying by 255.
    • Each image has the same pose as the corresponding RGB image.
    • The camera used was free of any distortions.
  • 1 Dictionary containing camera calibration parameters.
    • They belong to the camera that was used to render the RGB images.
    • Distortion coefficients are given in the standard [k1, k2, p1, p2, k3] order.
  • 1 Numpy array containing 8 camera transformations.
    • They specify the movements that the camera went through to render all images.
    • I.e. idx 0 specifies the transformation from 00.png to 01.png, idx 1 specifies the transformation from 01.png to 02.png, ...
    • This applies to both RGB and Depth images, as they have the same poses.
  • 1 Numpy array containing 7 features.
    • The features are specified for each of the 9 images.
    • Each feature is a 2D pixel location in "H, W" order, meaning the first value is the height/row in the image and the second width/column.
    • If a feature was not visible, it was entered as [-1, -1].
    • The features are unsorted, meaning that feature idx 0 for 00.png could be corresponding to e.g. feature idx 4 for 01.png.
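
Once loaded, the layout described above can be sanity-checked; a minimal sketch with dummy stand-in arrays (the real data is loaded further below):

```python
import numpy as np

# Dummy stand-ins with the shapes described above (the real arrays are loaded later).
camera_movement = np.zeros((8, 4, 4))    # 8 relative movements between 9 views
given_features = np.full((9, 7, 2), -1)  # 7 features per view, stored as [row, col]

# One movement per consecutive image pair: len == num_images - 1
print(camera_movement.shape[0] == 9 - 1)          # True
# A feature that is not visible in a view is marked [-1, -1]
print(np.all(given_features[0, 0] == [-1, -1]))   # True
```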

Solution requirements¶

  • Your code needs to compile, run, and produce an output.
  • Your target output should be a dense point cloud reconstruction (without holes) of the scene.
  • Your target output should be a dense point cloud reconstruction (without holes) of the scene.
    • The output should be in the .ply format. We provide a function that can export a .ply file.
    • You may inspect your .ply outputs in e.g. Meshlab.
    • See the 'Dense Point Cloud' sample image to get an idea of what is possible. (Meshlab screenshot with point shading set to None)
  • Your code should be a general solution.
    • This means that it could run correctly for a different dataset (with same input structure).
    • It should NOT include anything hardcoded specific to this dataset.
  • Your code should not be unnecessarily inefficient.
    • Our sample solution runs in less than 2 minutes total (including point cloud export).
    • If your solution runs for more than 10 minutes, you are being wasteful in some part of your program.

Samples¶

Image Distortion¶

(sample image)

Cameras¶

(sample image)

Cameras & Features¶

(sample image)

Dense Point Cloud¶

(sample image)

Imports¶

Please note the following:

  • These are all imports necessary to achieve the sample results.
  • You may remove and/or add other libraries at your own convenience.
  • Using library functions (from the given or other libraries) that bypass necessary computer vision tasks will not be recognized as 'solved'.
    • E.g.: If you need to undistort an image to get to the next step of the solution and use the library function cv2.undistort(), then we will evaluate the undistortion step as 'failed'.
    • E.g.: If you want to draw points in an image (to check your method or visualize in-between steps) and use the library function cv2.circle(), then there is no problem.
    • E.g.: If you need to perform complex mathematical operations and use some numpy function, then there is no problem.
    • E.g.: You do not like a provided utility function and find/know a library function that gives the same outputs from the same inputs, then there is no problem.
In [1]:
import os
import torch
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
#show plots inline in the notebook:
%matplotlib inline

Prepare Data¶

This should load all available data and also create some output directories. Feel free to rename variables or add additional directories as you see fit.

In [2]:
#Inputs
base_path = os.getcwd()
data_path = os.path.join(base_path, 'data')
img_path = os.path.join(data_path, 'images')
depth_path = os.path.join(data_path, 'depths')
print(f"The project's root path is '{base_path}'.")
print(f"Reading data from '{data_path}'.")
print(f"Image folder: '{img_path}'.")
print(f"Depth folder: '{depth_path}'.")

#Outputs
out_path = os.path.join(base_path, 'output')
ply_path = os.path.join(out_path, 'point_cloud')
os.makedirs(out_path, exist_ok=True)
os.makedirs(ply_path, exist_ok=True)
print(f"\nCreating directory '{out_path}'.")
print(f"Creating directory '{ply_path}'.")

#Load Data
camera_calibration = np.load(os.path.join(data_path, 'camera_calibration.npy'), allow_pickle=True)
camera_calibration = camera_calibration.item()#get dictionary from numpy array struct
given_features = np.load(os.path.join(data_path, 'given_features.npy'), allow_pickle=True)
camera_movement = np.load(os.path.join(data_path, 'camera_movement.npy'), allow_pickle=True)
The project's root path is '/home/hyprshree/Documents/Wi-Se 2024/3d-CV/upstream/3dCV-exercise'.
Reading data from '/home/hyprshree/Documents/Wi-Se 2024/3d-CV/upstream/3dCV-exercise/data'.
Image folder: '/home/hyprshree/Documents/Wi-Se 2024/3d-CV/upstream/3dCV-exercise/data/images'.
Depth folder: '/home/hyprshree/Documents/Wi-Se 2024/3d-CV/upstream/3dCV-exercise/data/depths'.

Creating directory '/home/hyprshree/Documents/Wi-Se 2024/3d-CV/upstream/3dCV-exercise/output'.
Creating directory '/home/hyprshree/Documents/Wi-Se 2024/3d-CV/upstream/3dCV-exercise/output/point_cloud'.

Provided Utility Functions¶

These functions are provided to reduce the complexity of some steps you might encounter. They were involved in the creation of the given samples. However, you do not have to use them and can use other means of achieving the same results.

In [3]:
def sample_image(numpy_image, numpy_sample_grid):
    '''
    This function samples a target image from a source image (numpy_image) based on specified pixel coordinates (numpy_sample_grid).
    Inputs:
        numpy_image: of shape=[H, W, C]. H is the height, W is the width, and C is the color channel of the source image from which color values will be sampled.
        numpy_sample_grid: of shape=[H, W, UV]. H is the height and W is the width of the target image that will be sampled. UV are the pixel locations in the source image from which to sample color values.
    Outputs:
        sampled_image: of shape=[H, W, C]. H is the height, W is the width, and C is the color channel of the target image that was sampled.
    '''
    height, width, _ = numpy_image.shape#[H, W, 3]

    #turn numpy array to torch tensor
    torch_sample_grid = torch.from_numpy(numpy_sample_grid)#[H, W, 2]
    #normalize from range (0, width-1) to (0, 1)
    torch_sample_grid[:, :, 0] = torch_sample_grid[:, :, 0] / (width-1)
    #normalize from range (0, height-1) to (0, 1)
    torch_sample_grid[:, :, 1] = torch_sample_grid[:, :, 1] / (height-1)
    #normalize from range (0, 1) to (-1, 1)
    torch_sample_grid = torch_sample_grid*2 -1

    #transform to necessary shapes
    torch_sample_grid = torch_sample_grid.unsqueeze(0)#[1, H, W, 2]
    torch_image = torch.from_numpy(numpy_image).double().permute(2, 0, 1).unsqueeze(0)#[1, 3, H, W]
    #sample image according to sample grid locations from source image
    sampled_image = torch.nn.functional.grid_sample(torch_image, torch_sample_grid, mode='bilinear', padding_mode='zeros', align_corners=True)
    #transform back to numpy image
    sampled_image = sampled_image.squeeze().permute(1, 2, 0).numpy().astype(np.uint8)#[H, W, 3]
    return sampled_image

def ply_creator(input_3d, rgb_data=None, filename='dummy'):
    ''' Creates a colored point cloud that you can visualise using e.g. Meshlab.
    Inputs:
        input_3d: of shape=[N, 3], each row is the 3D coordinate of each point
        rgb_data(optional): of shape=[N, 3], each row is the rgb color value of each point
        filename: file name for the .ply file to be created
    '''
    assert input_3d.ndim == 2, "Pass 3d points as NumPointsX3 array"
    header = "ply\nformat ascii 1.0\n"
    header += "element vertex " + str(input_3d.shape[0]) + "\n"
    header += "property float x\nproperty float y\nproperty float z\n"
    if rgb_data is not None:
        header += "property uchar red\nproperty uchar green\nproperty uchar blue\n"
    header += "end_header\n"
    filename = filename + '.ply'
    with open(filename, 'w') as fid:
        fid.write(header)
        for i in range(input_3d.shape[0]):
            for c in range(3):
                fid.write(str(input_3d[i, c]) + ' ')
            if rgb_data is not None:
                for c in range(3):
                    fid.write(str(rgb_data[i, c]) + ' ')
            fid.write('\n')
    return filename
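
A quick way to check the .ply round trip is to write a few points by hand, mirroring the header that `ply_creator` emits, and confirm the vertex count survives (file name here is arbitrary):

```python
import numpy as np

pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
cols = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]])

# Write a minimal colored ASCII .ply, mirroring the header ply_creator emits.
lines = ["ply", "format ascii 1.0", f"element vertex {len(pts)}",
         "property float x", "property float y", "property float z",
         "property uchar red", "property uchar green", "property uchar blue",
         "end_header"]
for p, c in zip(pts, cols):
    lines.append(" ".join(str(v) for v in p) + " " + " ".join(str(v) for v in c))
with open("smoke_test.ply", "w") as f:
    f.write("\n".join(lines) + "\n")

# Read back and confirm the vertex count in the header.
with open("smoke_test.ply") as f:
    content = f.read()
print("element vertex 3" in content)  # True
```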

Step 0: Perceiving the Inputs¶

In [4]:
num_images = given_features.shape[0]
num_features = given_features.shape[1]
print(f"Given Data contains: \n{num_images} Images\n{num_features} Features")
Given Data contains: 
9 Images
7 Features
In [5]:
print("\nCamera Calibration:")
for param, value in camera_calibration.items():
    print(f"\t{param}: {value}")
Camera Calibration:
	distortion_param: [-0.1, 0.02, 0.0, 0.0, -0.01]
	image_height: 551
	image_width: 881
	principal_point: [275.0, 440.0]
	focal_length_mm: 25
	sensor_width_mm: 35
	pixel_ratio: 1.0
	pixel_per_mm: 25.17142857142857
	focal_length_px: 629.2857142857142

The parameters are extracted and stored for later use. The camera's intrinsic parameters are used to form the intrinsic camera matrix K. Note that principal_point is given in (row, column) order, so its second entry is the x-coordinate of the principal point.

In [6]:
dist_coeffs = camera_calibration['distortion_param']
principal_point = camera_calibration['principal_point']
focal_length_px = camera_calibration['focal_length_px']
w = camera_calibration['image_width']
h = camera_calibration['image_height']
K = np.array([
    [focal_length_px, 0, principal_point[1]],
    [0, focal_length_px, principal_point[0]],
    [0, 0, 1]
])
In [7]:
print(f"\nCamera Movement Matrix Shape: {camera_movement.shape}: ")
print(f"Camera Movement from Frame 0 to 1 (homogeneous) = [R|T]:\n{camera_movement[0]}")
Camera Movement Matrix Shape: (8, 4, 4): 
Camera Movement from Frame 0 to 1 (homogeneous) = [R|T]:
[[ 0.70710678 -0.1830127   0.6830127  -3.        ]
 [ 0.1830127   0.98037987  0.0732233  -0.25881905]
 [-0.6830127   0.0732233   0.72672691  0.96592583]
 [ 0.          0.          0.          1.        ]]

The camera movement matrix shape (8, 4, 4) indicates there are 8 camera movements, which aligns with the 9 given views, since each movement relates two consecutive views. Each movement is described by a rotation and a translation.
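
Since each entry is a relative movement, the absolute pose of view k (in the frame of view 0) is obtained by chaining the matrices. A sketch with made-up pure-translation movements (values arbitrary):

```python
import numpy as np

# Three dummy relative movements: pure translations along x by 1, 2 and 3 units.
movements = np.stack([np.eye(4) for _ in range(3)])
movements[0, 0, 3], movements[1, 0, 3], movements[2, 0, 3] = 1.0, 2.0, 3.0

# Absolute pose of view k relative to view 0: chain movements 0..k-1.
pose = np.eye(4)
poses = [pose.copy()]
for M in movements:
    pose = M @ pose           # left-multiply each successive movement
    poses.append(pose.copy())

print(poses[3][0, 3])  # 6.0 -- the translations accumulate (1 + 2 + 3)
```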

In [8]:
print(f"Given 2D Features: {given_features.shape}")
print(f"Features in view 0(y,x): \n{given_features[0]}")
print(f"Coordinates of feature 0 in each view (y,x): \n{given_features[:, 0]}")
Given 2D Features: (9, 7, 2)
Features in view 0(y,x): 
[[163 616]
 [431 593]
 [380 672]
 [164 660]
 [378 462]
 [280 422]
 [274 650]]
Coordinates of feature 0 in each view (y,x): 
[[163 616]
 [192 722]
 [252 536]
 [ -1  -1]
 [168 693]
 [285 481]
 [187 650]
 [ -1  -1]
 [472 485]]

The shape of the given features array indicates there are 9 matrices, one per view. Each view matrix holds 7 features given as (row, column) coordinates, hence the shape (9, 7, 2). [-1, -1] indicates that the feature is not visible in that view.
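
A compact way to separate visible from missing entries is a boolean mask; a small sketch on a dummy array with the same [view, feature, (row, col)] layout:

```python
import numpy as np

# Dummy feature array: 2 views x 3 features, [-1, -1] marks an invisible feature.
feats = np.array([[[163, 616], [-1, -1], [380, 672]],
                  [[-1, -1], [192, 722], [-1, -1]]])

visible = np.all(feats != -1, axis=-1)  # True where both coordinates are valid
print(visible.sum(axis=1))              # visible features per view: [2 1]
```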

Visualizing all the views and their corresponding depth maps¶

In [37]:
plt.figure(figsize=(20,45)) 

for i in range(num_images):
    img_path_i = os.path.join(img_path, f'{i:02d}.png')
    depth_path_i = os.path.join(depth_path, f'{i:02d}.png')

    img_i = cv.imread(img_path_i)
    depth_i = cv.imread(depth_path_i, cv.IMREAD_UNCHANGED)

    plt.subplot(num_images, 2, 2 * i + 1)
    plt.imshow(depth_i, cmap='gray')
    plt.title(f'Depth {i:02d}')
    plt.axis('off')

    plt.subplot(num_images, 2, 2 * i + 2)
    plt.imshow(cv.cvtColor(img_i, cv.COLOR_BGR2RGB))
    plt.title(f'Image {i:02d}')
    plt.axis('off')

plt.tight_layout()
plt.show()

Visualizing the features¶

In [33]:
plt.figure(figsize=(80, 50))
for i in range(num_images):
    plt.subplot(3, 3, i + 1)
    img_path_i = os.path.join(img_path, f'{i:02d}.png')
    img_i = cv.imread(img_path_i)
    img_i = cv.cvtColor(img_i, cv.COLOR_BGR2RGB)
    plt.imshow(img_i)
    features_i = given_features[i]
    for enum, feature in enumerate(features_i):
        if feature[0] != -1 and feature[1] != -1:
            plt.plot(feature[1], feature[0], 'ro',markersize=30)  
    plt.title(f'Image {i:02d} with Features')
    plt.axis('off')
plt.tight_layout()
plt.show()

Step 1: Removing the Distortions¶

We implement a function "undistort2" based on the intrinsic matrix K and the distortion coefficients. With K, coordinates on the normalized image plane can be transformed to the pixel plane: (u, v, 1) = K * (x, y, 1), where (u, v) are the pixel coordinates and (x, y) are the normalized image coordinates. The inverse mapping is (x, y, 1) = inv(K) * (u, v, 1).

The undistortion is done using following steps:

  • Brown's radial distortion model is applied to the normalized image coordinates (x, y), resulting in the corrected coordinates x_corrected, y_corrected.
  • The corrected coordinates are transformed back to the pixel plane using the intrinsic matrix K, creating the sampling grid.
  • The distorted image is sampled using the sampling grid to generate the undistorted image.
In [11]:
def undistort2(image, camera_calibration):
    """
    Args:
        image: Input distorted image (H, W, C)
        camera_calibration: Given camera calibration parameters (dict)
    Returns:
        Undistorted image as a NDArray (H, W, C)
    """   
    pixel_coords_samples = np.mgrid[0:w, 0:h].T.reshape(-1, 2)
    
    # Convert pixel coordinates to homogeneous coordinates (u, v) -> (u, v, 1)
    pixel_coords_homogeneous = np.hstack((pixel_coords_samples, np.ones((pixel_coords_samples.shape[0], 1)))).T

    # Bring pixel coordinates to the normalized image plane
    inv_K = np.linalg.inv(K)
    image_plane_coords = inv_K @ pixel_coords_homogeneous
    image_plane_coords = image_plane_coords[:2, :].T
    x, y = image_plane_coords[:, 0], image_plane_coords[:, 1] 

    # Applying brown's distortion model
    r2 = x**2 + y**2 
    radial = 1 + dist_coeffs[0] * r2 + dist_coeffs[1] * r2**2 + dist_coeffs[4] * r2**3
    x_corrected = x * radial 
    y_corrected = y * radial 
    corrected_coords = np.stack([x_corrected, y_corrected], axis=-1) 

    # Bring corrected coordinates back to pixel plane
    corrected_coords = np.hstack((corrected_coords, np.ones((corrected_coords.shape[0], 1)))).T 
    sampling_grid = K @ corrected_coords
    sampling_grid = sampling_grid[:2, :].T
    sampling_grid = sampling_grid.reshape(h, w, 2) 

    # Remapping the distorted image to the undistorted image based on the sampling grid
    undistorted_image = sample_image(image, sampling_grid)

    return undistorted_image

Create and store the undistorted images in a folder for later use.

In [12]:
undistorted_img_path = os.path.join(base_path, 'undistorted_images')
os.makedirs(undistorted_img_path, exist_ok=True)

for path in sorted(os.listdir(img_path)):
    img = cv.imread(os.path.join(img_path, path))
    undistorted_img = undistort2(img, camera_calibration)
    new_path = os.path.join(undistorted_img_path, path)
    cv.imwrite(new_path, undistorted_img)

Display the undistorted images and original images side by side

In [40]:
plt.figure(figsize=(15, 45))
for i in range(num_images):
    img_path_i = os.path.join(undistorted_img_path, f'{i:02d}.png')
    img_i = cv.imread(img_path_i)
    plt.subplot(num_images, 2, 2 * i + 1)
    plt.imshow(cv.cvtColor(img_i, cv.COLOR_BGR2RGB))
    plt.title(f'Undistorted Image {i:02d}')
    plt.axis('off')
    plt.subplot(num_images, 2, 2 * i + 2)
    img_i = cv.imread(os.path.join(img_path, f'{i:02d}.png'))
    plt.imshow(cv.cvtColor(img_i, cv.COLOR_BGR2RGB))
    plt.title(f'Original Image {i:02d}')
    plt.axis('off')
plt.tight_layout()
plt.show()

Step 2: Matching Features¶

The given features are unsorted, so we need to match them across views and store them in a consistent order.

Before we match the features, we write functions to compute some helpful quantities.

  • First, the "compute_fundamental_matrix" function to compute the fundamental matrix.
  • Then, the "compute_transformations_matrix" function to get an array of transformation matrices for any given view with respect to the remaining views.
In [13]:
def compute_fundamental_matrix(K, R, t):
    """
    Args:
        K: intrinsic matrix (3x3), R: rotation matrix (3x3), t: translation vector (3,)
    Returns:
        Fundamental matrix F (3x3)
    """
    #translation skew symmetric matrix
    tssm = np.full((3,3), fill_value = 0, dtype = float) 
    tssm[0][1] = -t[2]
    tssm[0][2] = t[1]
    tssm[1][0] = t[2]
    tssm[1][2] = -t[0]
    tssm[2][0] = -t[1]
    tssm[2][1] = t[0]
    
    #Essential matrix 
    E = tssm @ R
    
    # Computing F with E
    F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)
    F = F/ np.linalg.norm(F, 'fro')
    return F
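
A synthetic check of this construction: for a point seen in two views related by (R, t), the pixel correspondences should satisfy x2ᵀ F x1 ≈ 0. A self-contained sketch with made-up K, R, t, re-deriving F the same way as the function above:

```python
import numpy as np

K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1  # small rotation about the y-axis
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.5, 0.1, 0.0])

# F from the essential matrix E = [t]_x R, as in compute_fundamental_matrix.
tx = np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])
F = np.linalg.inv(K).T @ (tx @ R) @ np.linalg.inv(K)

# Project a 3D point into both views (second camera: X2 = R @ X + t).
X = np.array([0.2, -0.3, 4.0])
x1 = K @ X
x1 = x1 / x1[2]
x2 = K @ (R @ X + t)
x2 = x2 / x2[2]

print(abs(x2 @ F @ x1) < 1e-10)  # True: the correspondences satisfy the epipolar constraint
```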
In [14]:
def compute_transformations_matrix(view_idx):
    """
    Args:
        view_idx: index of the view
    Returns:
        List of transformations from the given view to every view
    """
    view = np.eye(4)
    t = [view]
    for i in range(8 - view_idx):
        view = camera_movement[i+view_idx] @ view
        t.append(view)

    view = np.eye(4)
    for i in range(view_idx):
        view = np.linalg.inv(camera_movement[view_idx-1-i]) @ view
        t.insert(0, view)
    return t

Testing the functions with respect to the first view

In [15]:
R = camera_movement[0][:3, :3]
t = camera_movement[0][:3, 3]
F_one = compute_fundamental_matrix(K, R,t)
F_one
Out[15]:
array([[ 4.28186168e-22, -9.93501430e-06,  1.05692061e-03],
       [-1.40502320e-05,  4.41177501e-07,  2.44421851e-02],
       [ 1.49471146e-03, -1.50930515e-02, -9.99585628e-01]])
In [16]:
compute_transformations_matrix(0)[1]
Out[16]:
array([[ 0.70710678, -0.1830127 ,  0.6830127 , -3.        ],
       [ 0.1830127 ,  0.98037987,  0.0732233 , -0.25881905],
       [-0.6830127 ,  0.0732233 ,  0.72672691,  0.96592583],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

Now we define three functions to help match all features.

  • First, "feature_matching_using_epipolarlines" matches one feature of a view to another view with the help of epipolar geometry.
  • Second, "match_all_features_per_view" iterates over all features in a view.
  • Third, "match_features" iterates over all views, calling the second function. Since features might be absent from a view, it is important to loop through all possibilities to ensure complete matching.

To keep track of matched features and avoid repeated computations, the boolean matrix "matched_or_not" is maintained.

In [17]:
feature_matching_matrix = np.full((9,7,2),-1,dtype="float32")
matched_or_not = np.full((9,7),False,dtype="bool")

for i in range(num_features):
    feature_matching_matrix[0][i] = given_features[0][i]
    matched_or_not[0][i] = True
In [18]:
def feature_matching_using_epipolarlines(feat1, feat2, F, threshold=10):
    if feat1[0] == -1:
        return 1, [-1, -1]

    x = np.array([feat1[1], feat1[0], 1])
    eline = F @ x
    eline /= eline[2]

    y = -eline[2] / eline[1]
    yprime = -(eline[2] + eline[0] * h) / eline[1]
    if abs(y) > (h + w) * 2 or abs(yprime) > (h + w) * 2:
        return 0, [-1, -1]

    imatch, mini, flag = -1, float('inf'), False

    for l, feat in enumerate(feat2):
        if feat[0] == -1:
            continue

        xprime = np.array([feat[1], feat[0], 1])
        distance = abs(eline[0] * xprime[0] + eline[1] * xprime[1] + eline[2]) / np.sqrt(eline[0]**2 + eline[1]**2)

        if distance < mini:
            flag = distance + 1.1 > mini
            mini, imatch = distance, l
        elif abs(distance - mini) < 1.1:
            flag = True

    if flag:
        return 0, [0, 0]
    return (1, feat2[imatch]) if mini < threshold else (1, [-1, -1])
In [19]:
def match_all_features_per_view(view_idx):
    transformation = compute_transformations_matrix(view_idx)
    for i in range(num_images):
        if i == view_idx or all(matched_or_not[i]):
            continue

        R, t = transformation[i][:3, :3], transformation[i][:3, 3]
        F = compute_fundamental_matrix(K, R, t)
        matches, done, flag = [], [], False

        for j in range(num_features):
            feature = feature_matching_using_epipolarlines(feature_matching_matrix[view_idx][j], given_features[i], F)
            if feature[0] == 0 and feature[1][0] == -1:
                flag = True
                break
            matches.append(feature[1] if feature[0] != 0 else [-1, -1])
            done.append(feature[0] != 0)

        if not flag:
            for j, match_done in enumerate(done):
                if not matched_or_not[i][j]:
                    feature_matching_matrix[i][j] = matches[j]
                    matched_or_not[i][j] = match_done
In [20]:
def match_features():
    global num_images 
    for i in range(num_images):
        match_all_features_per_view(i)
        if all(all(matched_or_not[k][j] for j in range(num_features)) for k in range(num_images)):
            break

Finally, the output is the matrix "feature_matching_matrix", structured like the given features array, but with each feature's coordinates stored at a consistent index across all views.

In [21]:
match_features()
for i, row in enumerate(feature_matching_matrix, start=1):
    print(f"View number: {i}\n{' '.join(map(str, row))}")
View number: 1
[163. 616.] [431. 593.] [380. 672.] [164. 660.] [378. 462.] [280. 422.] [274. 650.]
View number: 2
[183. 694.] [495. 446.] [495. 651.] [192. 722.] [398. 444.] [282. 376.] [346. 704.]
View number: 3
[157. 501.] [384. 621.] [-1. -1.] [157. 540.] [-1. -1.] [291. 480.] [252. 536.]
View number: 4
[199. 614.] [-1. -1.] [409. 690.] [195. 658.] [307. 657.] [285. 422.] [307. 657.]
View number: 5
[168. 693.] [473. 446.] [474. 681.] [157. 722.] [-1. -1.] [285. 374.] [322. 722.]
View number: 6
[207. 500.] [-1. -1.] [-1. -1.] [207. 539.] [-1. -1.] [285. 481.] [303. 539.]
View number: 7
[177. 610.] [448. 574.] [386. 649.] [187. 650.] [383. 460.] [313. 424.] [284. 636.]
View number: 8
[-1. -1.] [-1. -1.] [-1. -1.] [-1. -1.] [505. 444.] [444. 367.] [-1. -1.]
View number: 9
[-1. -1.] [473. 631.] [350. 602.] [-1. -1.] [472. 485.] [472. 485.] [259. 554.]

Visualizing the matched features¶

Each unique feature is assigned a number. The same number appears at the same physical point across all views, confirming that the features are matched correctly.

In [45]:
plt.figure(figsize=(40, 25))
for i in range(num_images):
    plt.subplot(3, 3, i + 1)
    img_path_i = os.path.join(undistorted_img_path, f'{i:02d}.png')
    img_i = cv.imread(img_path_i)
    img_i = cv.cvtColor(img_i, cv.COLOR_BGR2RGB)
    plt.imshow(img_i)
    features_i = feature_matching_matrix[i]
    for enum, feature in enumerate(features_i):
        if feature[0] != -1 and feature[1] != -1:
            plt.plot(feature[1], feature[0], 'ro',markersize=5)  
            plt.text(feature[1], feature[0], f'{enum}', color='red', fontsize=30)
    plt.title(f'View {i} with Matched Features',fontsize=30)
    plt.axis('off')
plt.tight_layout()
plt.show()

Step 3: Triangulation and Reprojection¶

For a feature visible in all 9 views we have 9 correspondences, i.e. C(9, 2) = 36 possible view pairs. In principle a single pair suffices for triangulation, but the measurements are noisy, so each feature is triangulated jointly from all views in which it is visible, using a linear (DLT) formulation solved via SVD.
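
The linear (DLT) triangulation used below can be verified on synthetic data: project a known 3D point into two views, rebuild it from the A-matrix rows, and compare (a self-contained sketch with made-up cameras, not the notebook's exact code):

```python
import numpy as np

K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                 # reference camera
P2 = K @ np.hstack([np.eye(3), np.array([[1.0], [0.0], [0.0]])])  # shifted along x

# Project a known 3D point into both views.
X_true = np.array([0.3, -0.2, 5.0, 1.0])
x1 = P1 @ X_true
x1 = x1 / x1[2]
x2 = P2 @ X_true
x2 = x2 / x2[2]

# DLT: each view contributes the rows u*P[2] - P[0] and v*P[2] - P[1].
A = np.array([x1[0] * P1[2] - P1[0], x1[1] * P1[2] - P1[1],
              x2[0] * P2[2] - P2[0], x2[1] * P2[2] - P2[1]])
X = np.linalg.svd(A)[2][-1]   # right singular vector of the smallest singular value
X = X / X[3]                  # dehomogenize
print(np.allclose(X, X_true, atol=1e-6))  # True
```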

In [23]:
def get_projection_matrices():
    
    global camera_movement, K

    projection_matrices = []
    T = np.eye(4)
    for i in range(camera_movement.shape[0] + 1):  
        P = K @ T[:3]
        projection_matrices.append(P) 
        if i < camera_movement.shape[0]:
            T  = camera_movement[i] @ T

    return projection_matrices


projection_matrices = np.array(get_projection_matrices())
In [24]:
def triangulate(feat_ind):
    global given_features, feature_matching_matrix, projection_matrices
    available_views = []

    for i in range(feature_matching_matrix.shape[0]):
        feat_coords = feature_matching_matrix[i][feat_ind]
        if feat_coords[0] == -1 or feat_coords[1] == -1:
            continue  # Skip invalid features

        x = [i, feat_coords[1], feat_coords[0]]
        available_views.append(x)
    
    available_views = np.array(available_views)
    
    A_matrix = []
    for view in available_views:
        P = projection_matrices[int(view[0])]
        A1 = view[1] * P[2] - P[0] 
        A2 = view[2] * P[2] - P[1] 
        A_matrix.append(A1)
        A_matrix.append(A2)
    A_matrix = np.array(A_matrix)

    _, _, V = np.linalg.svd(A_matrix)
    point = V[-1]
    point /= point[-1]
    return point


features_3d = []
for i in range(given_features.shape[1]): 
    point = triangulate(i)
    features_3d.append(point)
features_3d = np.array(features_3d)
print(features_3d)
[[ 1.35201537e+00 -8.62304845e-01  4.81112929e+00  1.00000000e+00]
 [ 9.08391775e-01  9.28692208e-01  3.73654327e+00  1.00000000e+00]
 [ 1.63211365e+00  7.44572998e-01  4.43802616e+00  1.00000000e+00]
 [ 1.60153403e+00 -8.01505795e-01  4.58633108e+00  1.00000000e+00]
 [ 4.15273736e-01  5.40235038e-01  4.19877885e+00  1.00000000e+00]
 [-1.12312272e-01  3.94106547e-02  3.93977091e+00  1.00000000e+00]
 [ 1.59977506e+00 -4.41454716e-03  4.80063840e+00  1.00000000e+00]]
In [25]:
reprojected_features = []
for view_index in range(given_features.shape[0]): 
    rp = []
    for feat_index in range(given_features.shape[1]):
        x = projection_matrices[view_index] @ features_3d[feat_index]
        x /= x[-1]
        x = [x[1], x[0]] # [y,x]
        rp.append(x)
    reprojected_features.append(rp)

reprojected_features = np.array(reprojected_features)

Visualize given features and reprojected features¶

In [46]:
plt.figure(figsize=(40,25))
for i in range(num_images):
    plt.subplot(3, 3, i + 1)
    img_path_i = os.path.join(undistorted_img_path, f'{i:02d}.png')
    img_i = cv.imread(img_path_i)
    img_i = cv.cvtColor(img_i, cv.COLOR_BGR2RGB)
    plt.imshow(img_i)
    features_i = feature_matching_matrix[i]
    reprojected_features_i = reprojected_features[i]
    for enum, feature in enumerate(features_i):
        if feature[0] != -1 and feature[1] != -1:
            plt.plot(feature[1], feature[0], 'ro')  # given feature: feature[1] is x (column), feature[0] is y (row)
            plt.text(feature[1], feature[0], f'{enum}', color='red', fontsize=24)
            rp = reprojected_features_i[enum]
            plt.plot(rp[1], rp[0], 'bo')  # reprojected feature in blue

    plt.title(f'View {i} with Reprojected Features',fontsize=28)
    plt.axis('off')
plt.tight_layout()
plt.show()

Step 4: Building the dense cloud¶

  1. View 0 is selected as the reference view.
  2. Reprojected points are computed for each feature in view 0 using the 3D points and camera parameters.
  3. The distance is calculated between each reprojected point and the corresponding actual feature point in the image.
  4. The calculated distances are compared to a predefined threshold. Points with distances within the threshold are identified.
  5. The valid features and their corresponding reprojected points are stored for further processing.
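
The core back-projection used in this step is the pinhole inverse: given depth z at pixel (u, v), x = (u - cx) * z / f and y = (v - cy) * z / f. A round-trip sketch with arbitrary values (cx, cy taken from the calibration output above, where principal_point is stored as [row, col]):

```python
import numpy as np

# Focal length and principal point in pixels, as printed in the calibration dictionary.
f, cx, cy = 629.2857142857142, 440.0, 275.0
u, v, z = 500.0, 300.0, 4.2   # arbitrary pixel location and metric depth

# Back-project pixel + depth to a 3D point in the camera frame.
X = np.array([(u - cx) * z / f, (v - cy) * z / f, z])

# Re-project with K and recover the original pixel.
K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
p = K @ X
p = p / p[2]
print(np.allclose(p[:2], [u, v]))  # True
```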
In [27]:
def identify_correct_reprojections(view_index, threshold=10):
    global given_features, features_3d, projection_matrices,feature_matching_matrix

    P = projection_matrices[view_index]
    reprojections = []
    for feature in features_3d:
        x = P @ feature
        x /= x[-1]
        x = [x[1], x[0]]
        
        # check if near any feature in given_features[view_index]
        for feat_index in range(num_features):
            if given_features[view_index][feat_index].sum() == -2:
                continue
            actual = given_features[view_index][feat_index]
            if np.linalg.norm(np.array(x) - np.array(actual)) <= threshold:
                reprojections.append([actual, x, feature])

    return reprojections

reprojections = identify_correct_reprojections(2)
reprojections
Out[27]:
[[array([157, 501], dtype=int32),
  [np.float64(156.8474307413326), np.float64(500.5502209901243)],
  array([ 1.35201537, -0.86230485,  4.81112929,  1.        ])],
 [array([384, 621], dtype=int32),
  [np.float64(384.30255616944476), np.float64(621.4760065465906)],
  array([0.90839178, 0.92869221, 3.73654327, 1.        ])],
 [array([157, 540], dtype=int32),
  [np.float64(156.80514686407514), np.float64(540.6608237814221)],
  array([ 1.60153403, -0.8015058 ,  4.58633108,  1.        ])],
 [array([291, 480], dtype=int32),
  [np.float64(289.89008050211925), np.float64(479.6956561300923)],
  array([-0.11231227,  0.03941065,  3.93977091,  1.        ])],
 [array([252, 536], dtype=int32),
  [np.float64(251.81084957465325), np.float64(536.5997744111278)],
  array([ 1.59977506e+00, -4.41454716e-03,  4.80063840e+00,  1.00000000e+00])]]
In [28]:
def depth_to_3d_with_color(view_index):

    global depth_path, undistorted_img_path, camera_movement

    depth_img = cv.imread(os.path.join(depth_path, f'{view_index:02d}.png'), cv.IMREAD_UNCHANGED)
    clr_img = cv.imread(os.path.join(undistorted_img_path, f'{view_index:02d}.png'))
    correct_projections = identify_correct_reprojections(view_index)

    transformation_matrix = np.eye(4)
    for i in range(view_index):
        transformation_matrix = camera_movement[i] @ transformation_matrix

    inv_transformation_matrix = np.linalg.inv(transformation_matrix)

    # Recover the metric depth corresponding to gray value 255 in this view:
    # for each verified feature, scale = 255 * Z / gray_value, then average over features.
    max_depth_ = 0
    for feature in correct_projections:
        coords_3d = feature[2]  # homogeneous 3D point, shape (4,)
        coords_3d = transformation_matrix @ coords_3d  # into this view's camera frame
        v, u = feature[0]
        depth = depth_img[v, u]
        max_depth_ += 255 * coords_3d[2] / depth
    max_depth_ /= len(correct_projections)

    points_in_view_3d = []
    rgb_data = []
    for u in range(depth_img.shape[1]):
        for v in range(depth_img.shape[0]):
            depth = depth_img[v, u]
            if depth == 0 or depth == 255:
                continue  # skip background / clipped depth values
            actual_depth = max_depth_ * depth / 255 
            x = (u - camera_calibration['principal_point'][1]) * actual_depth / camera_calibration['focal_length_px']
            y = (v - camera_calibration['principal_point'][0]) * actual_depth / camera_calibration['focal_length_px']
            z = actual_depth

            X = np.array([x, y, z, 1])
            X = inv_transformation_matrix @ X
            x, y, z = X[:3]
            points_in_view_3d.append([x, y, z])

            # sample u, v from clr_img
            clr = clr_img[v, u]
            rgb_data.append(clr)


    return points_in_view_3d, rgb_data
In [29]:
out_3d_combined, out_rgb_combined = [], []
for i in range(num_images):
    outp = depth_to_3d_with_color(i)
    outp_3d, outp_rgb = outp

    out_3d_combined.extend(outp_3d)
    out_rgb_combined.extend(outp_rgb)

out_3d_combined = np.array(out_3d_combined)
out_rgb_combined = np.array(out_rgb_combined)

ply_creator(out_3d_combined, out_rgb_combined, 'output/point_cloud/pointcloud')
Out[29]:
'output/point_cloud/pointcloud.ply'

Our output¶

(screenshot of the final dense point cloud)